The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies

Neural Information Processing Systems

We study the relationship between the frequency of a function and the speed at which a neural network learns it. We build on recent results that show that the dynamics of overparameterized neural networks trained with gradient descent can be well approximated by a linear system. When normalized training data is uniformly distributed on a hypersphere, the eigenfunctions of this linear system are spherical harmonic functions. We derive the corresponding eigenvalues for each frequency after introducing a bias term in the model. This bias term had been omitted from the linear network model without significantly affecting previous theoretical results. However, we show theoretically and experimentally that a shallow neural network without bias cannot represent or learn simple, low-frequency functions with odd frequencies. Our results lead to specific predictions of the time it will take a network to learn functions of varying frequency. These predictions match the empirical behavior of both shallow and deep networks.
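The frequency-dependent convergence the abstract describes is easy to reproduce empirically. Below is a minimal sketch (our own illustration, not the authors' code): a one-hidden-layer ReLU network with bias terms is trained by full-batch gradient descent on sin(πx) and on sin(5πx) from an identical initialization. Under this setup, the low-frequency target is typically fit markedly faster, consistent with the eigenvalue analysis.

```python
import numpy as np

def train(k, steps=3000, m=64, lr=0.1, seed=0):
    """Fit sin(k*pi*x) with a one-hidden-layer ReLU net (with bias) by full-batch GD."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, 100).reshape(-1, 1)
    y = np.sin(k * np.pi * x)
    W1 = rng.standard_normal((1, m))               # input -> hidden weights
    b1 = rng.standard_normal(m)                    # hidden biases (the paper's bias term)
    w2 = rng.standard_normal((m, 1)) / np.sqrt(m)  # hidden -> output weights
    for _ in range(steps):
        pre = x @ W1 + b1                 # pre-activations, shape (100, m)
        h = np.maximum(pre, 0.0)          # ReLU
        err = h @ w2 - y                  # residual, shape (100, 1)
        g_pred = 2.0 * err / len(x)       # d(MSE)/d(prediction)
        g_pre = (g_pred @ w2.T) * (pre > 0)
        w2 -= lr * (h.T @ g_pred)
        W1 -= lr * (x.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)
    pred = np.maximum(x @ W1 + b1, 0.0) @ w2
    return float(np.mean((pred - y) ** 2))

loss_low = train(k=1)   # low-frequency target
loss_high = train(k=5)  # high-frequency target, identical initialization
```

After the same number of gradient steps, `loss_low` is expected to be substantially below `loss_high`, illustrating that lower frequencies are learned first.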


Reviews: The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies

Neural Information Processing Systems

What functions NNs learn (i.e., which functions they approximate) and how fast they learn them are central questions in the study of the dynamics of (D)NNs. A common conception behind this problem is that if one trains a network longer than necessary, the model might overfit. However, the definition of overfitting appears to vary from paper to paper. Moreover, overfitting is intimately linked with another hot topic in the area: over-parametrization. Please refer to "Advani & Saxe 2017, High-Dimensional Dynamics of Generalization Error for NNs" for a modern take on this link. Keeping this link in mind, we focus on fixed-size networks.

The paper finds that lower frequencies are learned first, and that bias terms allow odd frequencies to be learned. The restriction to spherical data is limiting, but the analysis and conclusions (particularly the rates of convergence) are novel and interesting.


How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer

Heiss, Jakob, Teichmann, Josef, Wutte, Hanna

arXiv.org Artificial Intelligence

Randomized neural networks (randomized NNs), where only the terminal layer's weights are optimized, constitute a powerful model class that reduces the computational cost of training. At the same time, these models generalize surprisingly well in various regression and classification tasks. In this paper, we give an exact macroscopic characterization (i.e., a characterization in function space) of the generalization behavior of randomized, shallow NNs with ReLU activation (RSNs). We show that RSNs correspond to a Generalized Additive Model (GAM)-type regression in which infinitely many directions are considered: the Infinite Generalized Additive Model (IGAM). The IGAM is formalized as the solution to an optimization problem in function space for a specific regularization functional and a fairly general loss. This work extends to multivariate NNs the results of [9], where we showed that wide RSNs with ReLU activation behave like spline regression under certain conditions when the input dimension is d = 1.
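The RSN setup the abstract studies, with only the output layer trained on top of a fixed random ReLU feature map, can be sketched in a few lines. The following is our own illustration (not the authors' code; all variable names are ours): the first layer is drawn at random and frozen, and the terminal layer's weights are fit by ridge regression in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: a simple 1-D regression target.
x = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = np.abs(x)  # a kink that ReLU features capture well

# Random, untrained first layer: weights and biases stay fixed.
m = 100                                    # hidden width
W = rng.standard_normal((1, m))
b = rng.uniform(-1.0, 1.0, m)
features = np.maximum(x @ W + b, 0.0)      # random ReLU feature map, shape (50, m)

# Only the terminal layer is optimized: ridge regression in closed form.
lam = 1e-8
A = features.T @ features + lam * np.eye(m)
w_out = np.linalg.solve(A, features.T @ y)

pred = features @ w_out
mse = float(np.mean((pred - y) ** 2))
```

Because only a linear least-squares problem is solved, training is far cheaper than full gradient-based optimization, which is the computational advantage the abstract refers to; the function-space characterization (IGAM) describes what this procedure converges to as the width grows.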


The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies

Basri, Ronen, Jacobs, David, Kasten, Yoni, Kritchman, Shira

Neural Information Processing Systems
